latent weight
Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
Optimization of Binarized Neural Networks (BNNs) currently relies on real-valued latent weights to accumulate small update steps. In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks. Instead, their main role is to provide inertia during training. We interpret current methods in terms of inertia and provide novel insights into the optimization of BNNs. We subsequently introduce the first optimizer specifically designed for BNNs, Binary Optimizer (Bop), and demonstrate its performance on CIFAR-10 and ImageNet. Together, the redefinition of latent weights as inertia and the introduction of Bop enable a better understanding of BNN optimization and open up the way for further improvements in training methodologies for BNNs.
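To make the inertia reading concrete, here is a toy sketch under our own assumptions, not the paper's code: a BinaryConnect-style scheme in which a real-valued latent weight is updated by plain SGD, clipped to [-1, 1], and binarized with sign. The helper names `binarize` and `steps_until_flip` are hypothetical. With a constant gradient, the latent weight's magnitude only controls how many steps pass before the binary weight flips.

```python
def binarize(x: float) -> int:
    """Sign binarization with the common convention sign(0) = +1."""
    return 1 if x >= 0 else -1


def steps_until_flip(latent: float, grad: float, lr: float,
                     clip: float = 1.0, max_steps: int = 10_000) -> int:
    """Apply a constant gradient to one latent weight with plain SGD and
    return the step at which its binarized value first changes sign
    (or -1 if it never flips within max_steps)."""
    start = binarize(latent)
    for step in range(1, max_steps + 1):
        latent -= lr * grad                      # SGD step on the latent weight
        latent = max(-clip, min(clip, latent))   # assumed BinaryConnect-style clipping
        if binarize(latent) != start:
            return step
    return -1


# Same gradient and learning rate, different latent magnitudes:
print(steps_until_flip(0.25, grad=1.0, lr=0.0625))  # 5
print(steps_until_flip(0.75, grad=1.0, lr=0.0625))  # 13
```

Both latent values binarize to +1, so they describe the same binary network; they differ only in how much accumulated gradient is needed before the sign changes.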
BSO: Binary Spiking Online Optimization Algorithm
Liang, Yu, Yang, Yu, Wei, Wenjie, Belatreche, Ammar, Wang, Shuai, Zhang, Malu, Yang, Yang
Binary Spiking Neural Networks (BSNNs) offer promising efficiency advantages for resource-constrained computing. However, their training algorithms often require substantial memory overhead due to latent-weight storage and temporal processing requirements. To address this issue, we propose the Binary Spiking Online (BSO) optimization algorithm, a novel online training algorithm that significantly reduces training memory. BSO directly updates weights through flip signals under the online training framework. These signals are triggered when the product of gradient momentum and weights exceeds a threshold, eliminating the need for latent weights during training. To enhance performance, we propose T-BSO, a temporal-aware variant that leverages the inherent temporal dynamics of BSNNs by capturing gradient information across time steps for adaptive threshold adjustment. Theoretical analysis establishes convergence guarantees for both BSO and T-BSO, with formal regret bounds characterizing their convergence rates. Extensive experiments demonstrate that both BSO and T-BSO achieve superior optimization performance compared to existing training methods for BSNNs. The code is available at https://github.com/hamings1/BSO.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > Canada (0.04)
- Asia > China > Hong Kong (0.04)
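A minimal sketch of the BSO flip rule as the abstract states it: a binary weight flips when the product of its gradient momentum and its current value exceeds a threshold, so no latent weight is stored. The exponential-moving-average form of the momentum, the parameter names `beta` and `threshold`, and the helper `bso_flip_step` are our assumptions for illustration; the online per-time-step schedule and T-BSO's adaptive threshold are not modeled.

```python
from typing import List, Tuple


def bso_flip_step(weights: List[int], momenta: List[float], grads: List[float],
                  beta: float, threshold: float) -> Tuple[List[int], List[float]]:
    """One flip-based update for binary weights in {-1, +1}.

    Only the binary weights and one momentum value per weight are kept;
    there is no real-valued latent copy of the weights.
    """
    new_weights, new_momenta = [], []
    for w, m, g in zip(weights, momenta, grads):
        m = beta * m + (1.0 - beta) * g   # assumed EMA form of the gradient momentum
        if m * w > threshold:             # flip signal from momentum x weight
            w = -w
        new_weights.append(w)
        new_momenta.append(m)
    return new_weights, new_momenta


w, m = bso_flip_step([1, -1, 1], [0.0, 0.0, 0.0], [0.4, 0.4, -0.4],
                     beta=0.5, threshold=0.1)
print(w)  # [-1, -1, 1]
```

Because |w| = 1, the test `m * w > threshold` holds exactly when the momentum has the same sign as w and exceeds the threshold in magnitude; a positive gradient on a +1 weight means descent would push it negative, hence the flip.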
Latent Attention For If-Then Program Synthesis
Chang Liu, Xinyun Chen, Eui Chul Shin, Mingcheng Chen, Dawn Song
Automatic translation from natural language descriptions into programs is a longstanding challenging problem. In this work, we consider a simple yet important sub-problem: translation from textual descriptions to If-Then programs. We devise a novel neural network architecture for this task which we train end-to-end. Specifically, we introduce Latent Attention, which computes multiplicative weights for the words in the description in a two-stage process with the goal of better leveraging the natural language structures that indicate the relevant parts for predicting program elements. Our architecture reduces the error rate by 28.57% compared to prior art.
Reviews: Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
This paper addresses the optimization of BNNs and proposes a novel latent-free optimizer for them, which challenges the existing practice of using latent weights. This is an interesting and original idea. One common view of BNN training treats the binary weights as an approximation of a real-valued weight vector; this paper instead argues that the latent weights used in previous methods are not in fact weights. The paper supports this argument by introducing the concept of inertia. Motivated by this insight, the authors introduce a novel optimizer called Bop.
Reviews: Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
This paper proposes a new training method for neural networks with binary weights. The main idea is to avoid the existing "latent weights approach", which treats the weights as continuous, and instead use a new method that relies on the sign of the weights. The proposed approach is based on momentum. Before the rebuttal, the reviewers found the paper to be original, novel, and simpler than existing methods. They had some concerns regarding the experiments, along with a few other minor concerns.
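The momentum-based sign-flip idea can be sketched as follows. This is written from our reading of the Bop paper and should be treated as an assumption: an exponential moving average m of the gradient with adaptivity rate `gamma`, and a flip when |m| exceeds a threshold `tau` while m has the same sign as the weight. The helpers `bop_update` and `steps_until_flip_bop` are hypothetical names, not a library API.

```python
from typing import Tuple


def bop_update(w: int, m: float, g: float, gamma: float, tau: float) -> Tuple[int, float]:
    """One Bop-style update of a single binary weight w in {-1, +1}."""
    m = (1.0 - gamma) * m + gamma * g   # exponential moving average of the gradient
    # Flip when the averaged gradient is strong (|m| > tau) and points the
    # same way as w, i.e. gradient descent would push w across zero.
    if abs(m) > tau and (m > 0) == (w > 0):
        w = -w
    return w, m


def steps_until_flip_bop(g: float, gamma: float, tau: float, max_steps: int = 1000) -> int:
    """Start from w = +1, m = 0, apply a constant gradient g, and return the
    step at which w first flips (or -1 if it never does)."""
    w, m = 1, 0.0
    for step in range(1, max_steps + 1):
        w, m = bop_update(w, m, g, gamma, tau)
        if w == -1:
            return step
    return -1


print(steps_until_flip_bop(1.0, gamma=0.5, tau=0.7))  # 2
print(steps_until_flip_bop(1.0, gamma=0.5, tau=0.9))  # 4
```

In this sketch, raising `tau` delays the flip under the same gradient, so the threshold (together with `gamma`) plays the role that latent-weight magnitude plays in latent-weight training.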
Latent Attention For If-Then Program Synthesis
Automatic translation from natural language descriptions into programs is a longstanding challenging problem. In this work, we consider a simple yet important sub-problem: translation from textual descriptions to If-Then programs. We devise a novel neural network architecture for this task which we train end-to-end. Specifically, we introduce Latent Attention, which computes multiplicative weights for the words in the description in a two-stage process with the goal of better leveraging the natural language structures that indicate the relevant parts for predicting program elements. Our architecture reduces the error rate by 28.57% compared to prior art [3]. We also propose a one-shot learning scenario of If-Then program synthesis and simulate it with our existing dataset. We demonstrate a variation on the training procedure for this scenario that outperforms the original procedure, significantly closing the gap to the model trained with all data.
- North America > United States (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
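The abstract describes Latent Attention only at a high level, so the following is a simplified sketch under our own assumptions rather than the paper's architecture. A first stage scores every word (hypothetical `latent_scores`), a second stage lets each word distribute attention over all words (hypothetical `active_scores`, one row per word), and the final weight of each word is the latent-weighted sum of the active weights, which multiplies the two stages together. Scores are taken as plain inputs instead of being computed from learned embeddings.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_attention(latent_scores: List[float],
                        active_scores: List[List[float]]) -> List[float]:
    """Combine a per-word latent distribution with per-word active distributions.

    Returns one attention weight per word; the weights sum to 1 because they
    are a convex combination of probability distributions.
    """
    n = len(latent_scores)
    if len(active_scores) != n or any(len(row) != n for row in active_scores):
        raise ValueError("active_scores must be an n x n matrix")
    latent = softmax(latent_scores)                    # stage 1: which words act as cues
    active = [softmax(row) for row in active_scores]   # stage 2: where each cue points
    return [sum(latent[j] * active[j][i] for j in range(n)) for i in range(n)]


# Word 0 is the dominant cue, and it points at word 2:
weights = two_stage_attention([5.0, 0.0, 0.0],
                              [[0.0, 0.0, 5.0],
                               [5.0, 0.0, 0.0],
                               [0.0, 5.0, 0.0]])
print(max(range(3), key=lambda i: weights[i]))  # 2
```

The point of the two stages in this sketch is that a word can receive high weight without scoring highly itself, as long as a strongly weighted cue word points at it.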
Contextual Knowledge Learning For Dialogue Generation
Zheng, Wen, Milic-Frayling, Natasa, Zhou, Ke
Incorporating conversational context and knowledge into dialogue generation models has been essential for improving the quality of the generated responses. The context, comprising utterances from previous dialogue exchanges, is used as a source of content for response generation and as a means of selecting external knowledge. However, to avoid introducing irrelevant content, it is key to enable fine-grained scoring of context and knowledge. In this paper, we present a novel approach to context and knowledge weighting as an integral part of model training. We guide the model training through a Contextual Knowledge Learning (CKL) process which involves Latent Vectors for context and knowledge, respectively. CKL Latent Vectors capture the relationship between context, knowledge, and responses through weak supervision and enable differential weighting of context utterances and knowledge sentences during the training process. Experiments with two standard datasets and human evaluation demonstrate that CKL leads to a significant improvement compared with the performance of six strong baseline models and remains robust when the size of the training set is reduced.
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.05)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
- North America > United States > Pennsylvania (0.04)